ReadMe File

Process: Simulation Tests for Performance Improvements of LCPM

General Comments:

The simulation tests for the paper were executed both in a Windows 11 environment and on an AWS cluster with CentOS nodes.
For simplicity, only the Windows 11 scripts are provided.
 
Installation:

Unzip the zip file into a directory; this directory is referred to as the base directory.
The base directory will contain the following subdirectories:

- data
- packages
- scripts
- source
  - cpp
  - R 


Edit test.bat (in ~\lcpm-test\scripts) so that base_dir, and any other path references in the file, point to the directory where the zip file was unzipped.

The packages directory contains an R package tar file (loglikrad98). This package must be installed to run test case G.

The following packages must be installed in the R environment:

- library(plyr)
- library(profvis)
- library(numDeriv)
- library(Matrix)
- library(alabama)
- library(rapportools)
- library(Rcpp)
- library(rlog)
- library(loglikrad98)  (install from the loglikrad98 tar file in the packages directory)


Running Simulations
   
1. Run the simulations by executing test.bat from the ~/scripts directory in a Windows command prompt.

test.bat reads a CSV file (stored in the data directory) in which each row defines one simulation test case.
Each row specifies the test case to execute, the number of replicates to run, the sample size of each replicate,
the number of covariates to generate, and the random seed to use.

The data directory contains three sample CSV files: 7.csv has a single row for testing the installation; 6.csv has a single row for the Simulation-2c.R case (see step 2); and 1.csv has the full set of test cases.

2. test.bat runs Simulation_stub.R, which sets up the simulation and calls one of two R scripts (depending on which call is commented out and which is left uncommented):

	a) Simulation-2c.R	Used for cases with exactly 2 covariates; this produces the final table in the paper.
	b) Simulation-10c.R	Used for cases with more than 2 covariates; this produces most of the figures in the paper.

   The two scripts generate the data differently, so the CSV file of test cases must match the selected script (6.csv for Simulation-2c.R, 1.csv for Simulation-10c.R).

3. Each script listed in step 2 executes a test case by:
	a) generating the appropriate test data,
	b) running the LCPM logic, including calls to both optimizers,
	c) providing the appropriate objective function code, and
	d) capturing runtime performance statistics.

4. Steps 2 and 3 are repeated until every row (test case) in the CSV file has been executed.

5. Outputs are written to two locations. One output file is placed in the ~/lcpm-test/output directory; the other, which contains the performance data, is placed in ~/lcpm-test/output/csv. Each output file name includes the test conditions and the run date. I have left some output files from earlier runs in these directories as examples.
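
Because the CSV file must match the script selected in step 2, a pre-flight check can catch a mismatch before a long run starts. The Python sketch below is not part of the supplied scripts. It assumes the CSV has a header row with a column named covariates (a hypothetical name; the actual column headers are not documented here) and applies the rule from step 2: exactly 2 covariates for Simulation-2c.R, more than 2 for Simulation-10c.R.

```python
import csv
import io

# Hypothetical column name: the real header used in the lcpm-test CSV files
# is not documented in this README, so adjust it to match your file.
COVARIATE_COLUMN = "covariates"


def script_for(n_covariates):
    """Map a covariate count to the script step 2 says should handle it."""
    if n_covariates == 2:
        return "Simulation-2c.R"
    if n_covariates > 2:
        return "Simulation-10c.R"
    return None  # assumption: neither script is meant for fewer than 2


def mismatched_rows(csv_text, selected_script):
    """Return the 1-based data-row numbers whose covariate count does not
    match the script currently selected in Simulation_stub.R."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row_number
        for row_number, row in enumerate(reader, start=1)
        if script_for(int(row[COVARIATE_COLUMN])) != selected_script
    ]


# Example with made-up rows (header names are hypothetical):
sample = (
    "case,replicates,sample_size,covariates,seed\n"
    "A,3,1000,2,42\n"
    "B,3,1000,10,7\n"
)
print(mismatched_rows(sample, "Simulation-2c.R"))  # [2]
```

Any rows the check reports would need to be removed from the CSV file, or the other script selected, before running test.bat.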


Precautions:

Test cases with many covariates (16+), large sample sizes (500,000+), and many replicates (more than 3) can be time-consuming. For the paper, these tests were run on an AWS cluster with each case running individually on its own node, which reduced overall run time. The supplied CSV file specifies only 3 replicates. You may want to run these tests in subsets by editing which rows the CSV file contains.
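
One way to manage the rows is to split a full test-case file into one file of quick cases plus one file per time-consuming case, mirroring the one-case-per-node approach used on the AWS cluster. This Python sketch is an illustration only: the column names replicates, sample_size and covariates and the file-name stem "cases" are hypothetical, and it treats a row as time-consuming if it crosses any one of the thresholds above, which is an assumption. It returns file names mapped to CSV text so the files can be saved to the data directory and run one at a time.

```python
import csv
import io

# Hypothetical column names; match them to the header of your CSV file.
COVARIATES, SAMPLE_SIZE, REPLICATES = "covariates", "sample_size", "replicates"


def is_heavy(row):
    """Thresholds from the Precautions note; crossing any one counts (assumption)."""
    return (int(row[COVARIATES]) >= 16
            or int(row[SAMPLE_SIZE]) >= 500_000
            or int(row[REPLICATES]) > 3)


def to_csv(fieldnames, rows):
    """Serialize rows (dicts) back to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def split_test_cases(csv_text, stem="cases"):
    """Return {file name: CSV text}: one file holding all quick rows, plus
    one file per heavy row so each heavy case can be run on its own."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames
    quick, heavy = [], []
    for row in reader:
        (heavy if is_heavy(row) else quick).append(row)
    files = {}
    if quick:
        files[f"{stem}-quick.csv"] = to_csv(fields, quick)
    for i, row in enumerate(heavy, start=1):
        files[f"{stem}-heavy-{i}.csv"] = to_csv(fields, [row])
    return files


# Example with made-up rows (header names are hypothetical):
sample = (
    "case,replicates,sample_size,covariates,seed\n"
    "A,3,1000,2,1\n"
    "B,3,500000,4,2\n"
    "C,3,1000,16,3\n"
)
print(sorted(split_test_cases(sample)))
# ['cases-heavy-1.csv', 'cases-heavy-2.csv', 'cases-quick.csv']
```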


Sincerely,

Roland DePratti
roland.depratti@my.ccsu.edu